Improving the Decision Value of Hierarchical Text Clustering Using Term Overlap Detection
Authors
Abstract
Similar resources
A New Method for Duplicate Detection Using Hierarchical Clustering of Records
The accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for some reasons, like aggregation of ...
Refactorings Detection Using Hierarchical Clustering
Refactoring is a process that helps maintain internal software quality throughout the software lifecycle. This paper introduces a new hierarchical clustering algorithm that can be used to improve software system design by identifying the appropriate refactorings. The algorithm is named HARD (Hierarchical Clustering Algorithm for Refactorings Determination) and uses a new...
Improving Text Search Process using Text Document Clustering Approach
Knowledge discovery and data mining is the process of retrieving meaningful knowledge from raw data using various techniques. Text mining is thus a subdomain of knowledge discovery that works on text data. This paper offers a different way of understanding text mining and its use in various real-time applications. This paper also includes the design of a hybrid tex...
Improving the accuracy of co-citation clustering using full text
Historically, co-citation models have been based only on bibliographic information. Full text analysis offers the opportunity to significantly improve the quality of the signals upon which these co-citation models are based. In this work we study the effect of reference proximity on the accuracy of co-citation clusters. Using a corpus of 270,521 full text documents from 2007, we compare the res...
Hierarchical clustering of large text datasets using Locality-Sensitive Hashing
In this paper, we present a hierarchical clustering algorithm for large text datasets using Locality-Sensitive Hashing (LSH). The main idea of LSH is to “hash” items several times, in such a way that similar items are more likely to be hashed to the same bucket than dissimilar items are. The main drawback of conventional hierarchical algorithms is their high time complexity (e.g. Single Linka...
Journal
Journal title: Australasian Journal of Information Systems
Year: 2015
ISSN: 1449-8618
DOI: 10.3127/ajis.v19i0.1180